Core: add the file count of specific deletes in the snapshot summary #4677

Merged
merged 1 commit into from
May 2, 2022

chenjunjiedada
Collaborator

This adds more info for deletes in the snapshot summary.

With V2 tables, users may run an offline service to compact deletes. For example, in our production environment, we analyze the snapshot summary from the commit event to decide whether to start a major or a minor compaction. More detailed summary info about deletes helps us better estimate compaction timing and resources.
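
To illustrate the use case above, here is a minimal sketch of a consumer that reads delete file counts from a snapshot summary map and picks a compaction mode. Everything in it is an assumption for illustration: the class `CompactionPlanner`, the threshold parameter, and the policy itself (many new equality delete files trigger a major compaction) are hypothetical, and the property key strings should be checked against `SnapshotSummary` in the Iceberg version you use.

```java
import java.util.Map;

// Hypothetical helper, not part of Iceberg: picks a compaction mode from a snapshot summary map.
final class CompactionPlanner {
  // Assumed summary keys; verify against SnapshotSummary in your Iceberg version.
  static final String ADDED_EQ_DELETE_FILES = "added-equality-delete-files";
  static final String ADDED_POS_DELETE_FILES = "added-position-delete-files";

  enum Mode { NONE, MINOR, MAJOR }

  private CompactionPlanner() {}

  // Treat an absent key as zero, since a count may be omitted from the summary.
  static long count(Map<String, String> summary, String key) {
    String value = summary.get(key);
    return value == null ? 0L : Long.parseLong(value);
  }

  // Illustrative policy only: the threshold and the rule are assumptions, not from the PR.
  static Mode choose(Map<String, String> summary, long majorThreshold) {
    long eqDeletes = count(summary, ADDED_EQ_DELETE_FILES);
    long posDeletes = count(summary, ADDED_POS_DELETE_FILES);
    if (eqDeletes >= majorThreshold) {
      return Mode.MAJOR;
    }
    if (eqDeletes + posDeletes > 0) {
      return Mode.MINOR;
    }
    return Mode.NONE;
  }
}
```

A caller would pass the summary map from the committed snapshot, for example `CompactionPlanner.choose(summary, 10)`, and schedule work based on the returned mode.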

@github-actions github-actions bot added the core label May 1, 2022
Comment on lines 255 to 256
setIf(addedDeleteFiles > 0, builder, ADDED_DELETE_FILES_PROP, addedDeleteFiles);
setIf(removedDeleteFiles > 0, builder, REMOVED_DELETE_FILES_PROP, removedDeleteFiles);
Contributor

[question] Since we are now tracking added/removed delete files at a more granular level (i.e., position delete files added/removed and equality delete files added/removed), these metrics can be re-created from the granular ones. Should we remove them from the snapshot summary or deprecate them?

Contributor

Deprecating things is quite a bit of work for little gain here. I'd just continue to update them.
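
The relationship discussed in this thread can be sketched as follows: the aggregate count stays in the summary and equals the sum of the granular counts. This is a self-contained stand-in, not the Iceberg implementation. The class `DeleteFileSummarySketch`, the `summarize` method, and the literal key strings are assumptions for illustration. Only the `setIf` pattern (write a property when its condition holds) is taken from the reviewed lines.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the summary builder discussed above; not the Iceberg class.
final class DeleteFileSummarySketch {
  private DeleteFileSummarySketch() {}

  // Mirrors the setIf pattern in the reviewed lines: write a property only when the condition holds.
  static void setIf(boolean condition, Map<String, String> props, String key, long value) {
    if (condition) {
      props.put(key, String.valueOf(value));
    }
  }

  // Key strings are assumed; check SnapshotSummary for the names used by your version.
  static Map<String, String> summarize(long addedEqDeletes, long addedPosDeletes) {
    Map<String, String> props = new LinkedHashMap<>();
    // The aggregate is still updated alongside the granular counts.
    long addedDeleteFiles = addedEqDeletes + addedPosDeletes;
    setIf(addedDeleteFiles > 0, props, "added-delete-files", addedDeleteFiles);
    setIf(addedEqDeletes > 0, props, "added-equality-delete-files", addedEqDeletes);
    setIf(addedPosDeletes > 0, props, "added-position-delete-files", addedPosDeletes);
    return props;
  }
}
```

Because zero counts are never written in this sketch, a reader of the summary has to treat a missing key as zero when re-creating the aggregate from the granular values.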

@rdblue rdblue merged commit 2cb2a00 into apache:master May 2, 2022
Contributor

rdblue commented May 2, 2022

Thanks, @chenjunjiedada!

kazuyukitanimura pushed a commit to kazuyukitanimura/spark that referenced this pull request Aug 10, 2022
Iceberg 0.13.0.3 - ADT 1.1.7

2022-05-20

PRs Merged

* Internal: Parquet bloom filter support (apache#594 (https://github.pie.apple.com/IPR/apache-incubator-iceberg/pull/594))
* Internal: AWS Kms Client (apache#630 (https://github.pie.apple.com/IPR/apache-incubator-iceberg/pull/630))
* Internal: Core: Add client-side check of encryption properties (apache#626 (https://github.pie.apple.com/IPR/apache-incubator-iceberg/pull/626))
* Core: Align snapshot summary property names for delete files (apache#4766 (apache/iceberg#4766))
* Core: Add eq and pos delete file counts to snapshot summary (apache#4677 (apache/iceberg#4677))
* Spark 3.2: Clean static vars in SparkTableUtil (apache#4765 (apache/iceberg#4765))
* Spark 3.2: Avoid reflection to load metadata tables in SparkTableUtil (apache#4758 (apache/iceberg#4758))
* Core: Fix query failure when using projection on top of partitions metadata table (apache#4720) (apache#619 (https://github.pie.apple.com/IPR/apache-incubator-iceberg/pull/619))

Key Notes

Bloom filter support and client-side encryption features can be used in this release. Both features are only enabled with explicit flags and will not affect existing tables or jobs.
sunchao pushed a commit to sunchao/iceberg that referenced this pull request May 9, 2023
3 participants